Abstract

The alternating direction method of multipliers, and extensions thereof, can be
used to derive potentially parallel decomposition algorithms for convex programming.
This paper focuses on two kinds of problems, monotropic programs (including
linear programs) and block-separable problems. For block-separable problems,
the algorithm obtained bears some resemblance to an earlier method due to Spingarn,
but solves a larger number of simpler subproblems at each iteration. Its fundamental
operation is projection onto the epigraph of a convex function. For
monotropic programs, one obtains a compact method that has some interesting
properties when specialized to linear programming, and, for quadratic problems,
has been shown to be competitive in practice in a massively parallel environment.

1. Introduction

Consider  the  convex  programming  problem 

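$$\begin{array}{ll} \text{minimize} & f(x) + g(z) \\ \text{such that} & Mx = z, \end{array} \tag{1}$$

where $f:\mathbb{R}^n \to (-\infty,+\infty]$ and $g:\mathbb{R}^s \to (-\infty,+\infty]$ are closed proper convex, and $M$ is some $s \times n$
matrix. The alternating direction method of multipliers for (1) is then given by the recursions

$$\begin{aligned}
x^{k+1} &= \arg\min_{x} \left\{ f(x) + \langle p^k, Mx \rangle + \tfrac{\lambda}{2} \|Mx - z^k\|^2 \right\} \\
z^{k+1} &= \arg\min_{z} \left\{ g(z) - \langle p^k, z \rangle + \tfrac{\lambda}{2} \|Mx^{k+1} - z\|^2 \right\} \\
p^{k+1} &= p^k + \lambda \left( Mx^{k+1} - z^{k+1} \right),
\end{aligned} \tag{2}$$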

where $\lambda$ is a given positive scalar. This method resembles the conventional Hestenes-Powell
method of multipliers for (1) (see for example [1]), except that it minimizes the augmented
Lagrangian function

$$L_\lambda(x,z,p) = f(x) + g(z) + \langle p, Mx - z \rangle + \tfrac{\lambda}{2} \|Mx - z\|^2 \tag{3}$$

first with respect to $x$, and then with respect to $z$, rather than with respect to both $x$ and $z$
simultaneously. Thus, $x$ and $z$ are effectively decoupled, and the method avoids the difficulties caused by
the non-separable cross term $2z^{\top}Mx$ in the quadratic penalty of (3). However, the
method still retains many of the theoretical convergence advantages of the method of multipliers
over algorithms using the simple Lagrangian $L_0(x,z,p) = f(x) + g(z) + \langle p, Mx - z \rangle$.
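As a concrete illustration of recursion (2), the following minimal sketch runs the three updates on a one-dimensional toy problem. The problem itself (f(x) = 0.5*(x - a)^2, g(z) = tau*|z|, M = 1), the function names, and the closed-form subproblem solutions are assumptions chosen for illustration; they do not appear in the paper.

```python
def soft_threshold(t, kappa):
    """Return argmin_z kappa*|z| + 0.5*(z - t)**2 (standard soft-thresholding)."""
    if t > kappa:
        return t - kappa
    if t < -kappa:
        return t + kappa
    return 0.0


def admm_scalar(a, tau, lam=1.0, iters=200):
    """Recursion (2) for: minimize 0.5*(x - a)**2 + tau*|z| subject to x = z."""
    z, p = 0.0, 0.0
    x = 0.0
    for _ in range(iters):
        # x-step: minimize 0.5*(x - a)^2 + p*x + (lam/2)*(x - z)^2 in closed form
        x = (a - p + lam * z) / (1.0 + lam)
        # z-step: minimize tau*|z| - p*z + (lam/2)*(x - z)^2
        z = soft_threshold(x + p / lam, tau / lam)
        # multiplier update
        p = p + lam * (x - z)
    return x, z, p
```

For a = 3 and tau = 1 the iterates approach x = z = 2 with multiplier p = 1, which satisfies the Kuhn-Tucker conditions of this toy problem.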

The alternating direction method of multipliers was first introduced by Glowinski and Marroco
[11], and by Gabay and Mercier [9]. Fortin and Glowinski [7] furthered the theory of the method,
and Gabay [8] demonstrated its relationship to the Douglas-Rachford splitting procedure for
monotone operators [13]. Eckstein and Bertsekas [4] have combined this relationship with some
proximal point analysis to obtain a generalization of the algorithm. A convergence theorem for
this generalized method is restated in Section 2 of this paper, and forms the basis of the following
analysis. Alternative analyses of the alternating direction method of multipliers may be found in
[10] and [2, pp. 253-261].

This  paper  presents  two  ways  of  transforming  convex  programs  into  the  form  (1),  and  then  studies 
the  algorithms  resulting  from  applying  (2)  and  its  generalizations.  In  both  cases,  significant 
decomposition  of  the  original  problem  and  extensive  opportunities  for  parallelism  result. 

The first transformation, analyzed in Section 3 of this paper, is applicable to monotropic programs
[16], that is, separable convex programs having only linear constraints. It results in an algorithm
sometimes called the “alternating step method.” This algorithm has an interesting interpretation
when applied to linear programming, and has been shown to be computationally competitive for
certain kinds of quadratic-cost problems on massively parallel computer architectures [6]. No
complete derivation of it has been published, although a preliminary analysis (based on the current
author’s early research) appears in [2, p. 254].

The second transformation, presented in Section 4, is for block-separable problems of the sort
studied by Spingarn [18]. The approach given here yields a method that solves a much larger
number of simpler subproblems at each iteration, each with far fewer points of nondifferentiability.
This algorithm is called an epigraphic projection method because the basic calculation in each
subproblem is projection onto the epigraph [14, p. 23] of a convex function. A preliminary,
unpublished analysis of this epigraphic projection method appears in [3].

Section  5  of  this  paper  gives  some  closing  observations. 

2.  A  Generalized  Alternating  Direction  Method  of  Multipliers 

In this paper, $\|\cdot\|$ denotes the usual Euclidean norm in $\mathbb{R}^n$, and $a \overset{\varepsilon}{\approx} b$ will be a shorthand for
$\|a - b\| \le \varepsilon$. Convex analysis notation will be adopted from [14].

The convex program (1) can also be expressed as

$$\underset{x \in \mathbb{R}^n}{\text{minimize}} \quad f(x) + g(Mx) \tag{4}$$

and, attaching a vector of multipliers $p \in \mathbb{R}^s$ to the constraints in (1), one obtains a symmetrical
dual problem

$$\underset{p \in \mathbb{R}^s}{\text{minimize}} \quad f^*(-M^{\top}p) + g^*(p), \tag{5}$$

where the asterisk denotes the convex conjugacy operation [14]. A pair $(x^*, p^*) \in \mathbb{R}^n \times \mathbb{R}^s$ is said
to be a Kuhn-Tucker pair for (1) or (4) if $-M^{\top}p^* \in \partial f(x^*)$ and $p^* \in \partial g(Mx^*)$, where “$\partial$”
denotes the subgradient mapping. It is a basic exercise in convex analysis to show that $(x^*, p^*)$ is
a Kuhn-Tucker pair if and only if $x^*$ is optimal for (1) and (4), and $p^*$ is optimal for (5).

We  now  present  a  convergence  theorem  for  a  generalization  of  the  alternating  direction  method  of 
multipliers  (2);  a  proof  may  be  found  in  [4]. 

Theorem 1. Consider a convex program in the form (1) or (4), where $M$ has full column rank. Let
$p^0, z^0 \in \mathbb{R}^s$, and suppose we are given some scalar $\lambda > 0$, nonnegative summable sequences $\{\mu_k\}$
and $\{\nu_k\}$, and

$$\{\rho_k\}_{k=0}^{\infty} \subseteq (0,2), \qquad 0 < \inf_{k \ge 0} \rho_k \le \sup_{k \ge 0} \rho_k < 2 .$$

Suppose $\{x^k\}$, $\{z^k\}$, and $\{p^k\}$ conform, for all $k \ge 0$, to

$$x^{k+1} \overset{\mu_k}{\approx} \arg\min_{x} \left\{ f(x) + \langle p^k, Mx \rangle + \tfrac{\lambda}{2} \|Mx - z^k\|^2 \right\} \tag{6}$$

$$z^{k+1} \overset{\nu_k}{\approx} \arg\min_{z} \left\{ g(z) - \langle p^k, z \rangle + \tfrac{\lambda}{2} \left\| \rho_k Mx^{k+1} + (1-\rho_k) z^k - z \right\|^2 \right\} \tag{7}$$

$$p^{k+1} = p^k + \lambda \left( \rho_k Mx^{k+1} + (1-\rho_k) z^k - z^{k+1} \right). \tag{8}$$

Then if (1) has a Kuhn-Tucker pair, $\{x^k\}$ converges to a solution of (1) and $\{p^k\}$ converges to a
solution of the dual problem (5). Furthermore, $\{z^k\}$ converges to $M\left(\lim_{k\to\infty} x^k\right)$. If (5) has no optimal
solution, then at least one of the sequences $\{p^k\}$ or $\{z^k\}$ is unbounded.

The algorithm (6)-(8) reduces to the alternating direction method of multipliers in the case that
$\rho_k = 1$ for all $k$, and all minimizations are performed exactly.
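To make the role of the relaxation factor concrete, here is a small sketch of (6)-(8) with exact minimizations on a one-dimensional toy problem. The toy problem (f(x) = 0.5*(x - a)^2, g(z) = tau*|z|, M = 1), the function names, and the use of a constant `rho` are assumptions made for illustration only.

```python
def soft_threshold(t, kappa):
    """Return argmin_z kappa*|z| + 0.5*(z - t)**2."""
    if t > kappa:
        return t - kappa
    if t < -kappa:
        return t + kappa
    return 0.0


def generalized_admm_scalar(a, tau, lam=1.0, rho=1.5, iters=200):
    """Steps (6)-(8) with mu_k = nu_k = 0 and constant rho_k = rho in (0, 2)."""
    x, z, p = 0.0, 0.0, 0.0
    for _ in range(iters):
        # step (6): exact minimizer of 0.5*(x - a)^2 + p*x + (lam/2)*(x - z)^2
        x = (a - p + lam * z) / (1.0 + lam)
        # relaxed point rho*M*x^{k+1} + (1 - rho)*z^k used in (7) and (8)
        w = rho * x + (1.0 - rho) * z
        # step (7): exact minimizer of tau*|z| - p*z + (lam/2)*(w - z)^2
        z_new = soft_threshold(w + p / lam, tau / lam)
        # step (8): multiplier update
        p = p + lam * (w - z_new)
        z = z_new
    return x, z, p
```

With `rho=1.0` the loop coincides with recursion (2); other values in (0, 2) reach the same limit on this example.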

3.  Monotropic  Programming 

A monotropic program [16] is a convex programming problem taking the canonical form

$$\begin{array}{ll} \text{minimize} & \displaystyle\sum_{j=1}^{n} h_j(x_j) \\ \text{subject to} & Ax = b, \end{array} \tag{9}$$

where the $h_j:\mathbb{R} \to (-\infty,+\infty]$ are closed proper convex, $A$ is a real $m \times n$ matrix, and
$b \in \mathbb{R}^m$.

Since the $h_j$ can take on the value $+\infty$, they may impose implicit interval constraints on the $x_j$.
Thus, by using slack variables and manipulating the $h_j$, any convex optimization problem with
a separable lower semicontinuous objective and a finite number of linear equality and inequality
constraints can be converted to the form (9).

Given some monotropic program (9), define $d(i)$, the degree of constraint $i$, to be the number of
nonzero elements in row $i$ of $A$. If $A$ is the node-arc incidence matrix of a network or graph, then
this definition agrees with the usual notion of the degree of node $i$ in the corresponding graph. Let
$A_{\cdot j}$ denote column $j$ of $A$, and let $A_{i\cdot}$ denote row $i$ of $A$. The surplus or residual $r_i(x)$ of constraint
$i$ of the system $Ax = b$, with respect to the primal variables $x$, is $b_i - \langle A_{i\cdot}, x \rangle$.

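Now consider the conversion of (9) to the form (1). Here, $f$ will be defined on $\mathbb{R}^n$ and $g$ on $\mathbb{R}^{mn}$.
Index the components of vectors $z \in \mathbb{R}^{mn}$ as $z_{ij}$, where $1 \le i \le m$ and $1 \le j \le n$. Then let

$$f(x) = \sum_{j=1}^{n} h_j(x_j) \tag{10}$$

$$C = \left\{ z \in \mathbb{R}^{mn} \;\middle|\; \sum_{j=1}^{n} z_{ij} = b_i \;\; \forall\, i = 1,\dots,m, \quad a_{ij} = 0 \Rightarrow z_{ij} = 0 \right\} \tag{11}$$

$$g(z) = \begin{cases} 0, & z \in C \\ +\infty, & z \notin C \end{cases} \tag{12}$$

$$M = \begin{bmatrix} \operatorname{diag}(A_{1\cdot}) \\ \operatorname{diag}(A_{2\cdot}) \\ \vdots \\ \operatorname{diag}(A_{m\cdot}) \end{bmatrix} \tag{13}$$

where, for any vector $v \in \mathbb{R}^n$, $\operatorname{diag}(v)$ denotes the $n \times n$ matrix $D$ with entries $d_{jj} = v_j$ along the
diagonal, and zeroes elsewhere. It is easily confirmed that, under (10)-(13), problems (1) and (9)
are equivalent. Furthermore, unless $A$ has a column that consists entirely of zeroes, the matrix $M$
of (13) has full column rank.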
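The construction can be checked numerically on a small instance. The sketch below builds the stacked matrix of (13) as a plain list of rows and computes the degrees and residuals defined above; the helper names (`build_M`, `matvec`, `degrees`, `residuals`) and the dense list-of-lists representation are illustrative assumptions, not part of the paper.

```python
def build_M(A):
    """Stack diag(A_1.), ..., diag(A_m.) as in (13); result has m*n rows and n columns."""
    m, n = len(A), len(A[0])
    M = []
    for i in range(m):
        for j in range(n):
            row = [0.0] * n
            row[j] = A[i][j]  # row (i, j) of M maps x to a_ij * x_j
            M.append(row)
    return M


def matvec(M, x):
    """Dense matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def degrees(A):
    """d(i): number of nonzero entries in row i of A."""
    return [sum(1 for a in row if a != 0) for row in A]


def residuals(A, b, x):
    """r_i(x) = b_i - <A_i., x>."""
    return [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
```

For z = Mx, the entries z_ij = a_ij * x_j in block i sum to the i-th component of Ax, so z lies in C exactly when Ax = b.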


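We now apply the generalized alternating direction method (6)-(8) to this reformulation of the
monotropic program. By the separability of $f$ and the form of $M$, (6) can now be decomposed into
$n$ independent computations of the form

$$x_j^{k+1} \overset{\varepsilon_k}{\approx} \arg\min_{x_j} \left\{ h_j(x_j) + \sum_{i=1}^{m} \left[ p_{ij}^k a_{ij} x_j + \tfrac{\lambda}{2} \left( a_{ij} x_j - z_{ij}^k \right)^2 \right] \right\},$$

where $\varepsilon_k = \mu_k/\sqrt{n}$. Now consider the computation of $z^{k+1}$ in (7). For any $z \in \mathbb{R}^{mn}$, and
$1 \le i \le m$, let $z_i \in \mathbb{R}^n$ be the subvector of $z$ with components $z_{ij}$, $j = 1,\dots,n$. Let

$$C_i = \left\{ z_i \in \mathbb{R}^n \;\middle|\; \sum_{j=1}^{n} z_{ij} = b_i, \quad a_{ij} = 0 \Rightarrow z_{ij} = 0 \right\},$$

so that $C = C_1 \times \dots \times C_m$. Finding $z^{k+1}$ can be reduced to $m$ independent calculations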


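$$z_i^{k+1} \overset{\nu_k/\sqrt{m}}{\approx} \arg\min_{z_i \in C_i} \left\{ -\langle p_i^k, z_i \rangle + \tfrac{\lambda}{2} \sum_{j=1}^{n} \left( \rho_k a_{ij} x_j^{k+1} + (1-\rho_k) z_{ij}^k - z_{ij} \right)^2 \right\},$$

where $p_i^k$ is the subvector of $p^k$ with components $p_{ij}^k$, $j = 1,\dots,n$. To solve this problem exactly for
each $i$, we attach a single Lagrange multiplier $\pi_i$ to the constraint $\sum_{j=1}^{n} z_{ij} = b_i$. Some algebraic
manipulations of the Karush/Kuhn-Tucker conditions yield

$$\pi_i^{k+1} = \frac{1}{d(i)} \left( \lambda \left[ b_i - \sum_{j:\,a_{ij} \ne 0} \left( \rho_k a_{ij} x_j^{k+1} + (1-\rho_k) z_{ij}^k \right) \right] - \sum_{j:\,a_{ij} \ne 0} p_{ij}^k \right) \tag{14}$$

$$z_{ij}^{k+1} = \begin{cases} \rho_k a_{ij} x_j^{k+1} + (1-\rho_k) z_{ij}^k + \dfrac{p_{ij}^k + \pi_i^{k+1}}{\lambda}, & a_{ij} \ne 0 \\[1ex] 0, & a_{ij} = 0 . \end{cases} \tag{15}$$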
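The algebra behind the multiplier and the $z$-update can be sketched as follows. This is a reconstruction under one sign convention for $\pi_i$ (an assumption, chosen so that the multiplier update yields $p_{ij}^{k+1} = -\pi_i^{k+1}$); the shorthand $w_{ij}$ is introduced here and is not the paper's notation.

```latex
% Shorthand (introduced for this sketch only), for a_{ij} \neq 0:
w_{ij} = \rho_k a_{ij} x_j^{k+1} + (1-\rho_k) z_{ij}^k

% Lagrangian of the z_i-subproblem with multiplier \pi_i:
\mathcal{L}(z_i, \pi_i) = -\langle p_i^k, z_i \rangle
    + \tfrac{\lambda}{2} \sum_{j:\,a_{ij}\neq 0} (w_{ij} - z_{ij})^2
    + \pi_i \Big( b_i - \sum_{j:\,a_{ij}\neq 0} z_{ij} \Big)

% Stationarity in z_{ij}:
-p_{ij}^k - \lambda (w_{ij} - z_{ij}) - \pi_i = 0
\quad\Longrightarrow\quad
z_{ij} = w_{ij} + \frac{p_{ij}^k + \pi_i}{\lambda}

% Feasibility, summing over the d(i) indices with a_{ij} \neq 0:
b_i = \sum_{j:\,a_{ij}\neq 0} w_{ij}
    + \frac{1}{\lambda} \Big( \sum_{j:\,a_{ij}\neq 0} p_{ij}^k + d(i)\,\pi_i \Big)
\quad\Longrightarrow\quad
\pi_i = \frac{1}{d(i)} \Big( \lambda \Big[ b_i - \sum_{j:\,a_{ij}\neq 0} w_{ij} \Big]
    - \sum_{j:\,a_{ij}\neq 0} p_{ij}^k \Big)
```

When $z^k \in C$ and $p_{ij}^k = -\pi_i^k$ for all $a_{ij} \ne 0$, the bracket equals $\rho_k r_i(x^{k+1})$ and the last line collapses to $\pi_i = \pi_i^k + \lambda \rho_k r_i(x^{k+1})/d(i)$.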


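Finally, consider the multiplier update (8). For $(i,j)$ such that $a_{ij} = 0$, one has simply $p_{ij}^{k+1} = p_{ij}^k$.
For $a_{ij} \ne 0$,

$$\begin{aligned}
p_{ij}^{k+1} &= p_{ij}^k + \lambda \left( \rho_k a_{ij} x_j^{k+1} + (1-\rho_k) z_{ij}^k - z_{ij}^{k+1} \right) \\
&= p_{ij}^k - \lambda \left\{ \tfrac{1}{\lambda} \left[ p_{ij}^k + \pi_i^{k+1} \right] \right\} \\
&= -\pi_i^{k+1} .
\end{aligned}$$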


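It is therefore possible to eliminate $\{p^k\}$ from the algorithm in favor of the lower-dimensional
sequence $\{\pi^k\} \subset \mathbb{R}^m$, obtaining (for $z^k \in C$)

$$\pi_i^{k+1} = \pi_i^k + \frac{\lambda \rho_k}{d(i)}\, r_i(x^{k+1}), \qquad
z_{ij}^{k+1} = \begin{cases} \rho_k a_{ij} x_j^{k+1} + (1-\rho_k) z_{ij}^k + \dfrac{\rho_k}{d(i)}\, r_i(x^{k+1}), & a_{ij} \ne 0 \\[1ex] 0, & a_{ij} = 0 \end{cases}$$

from (15). It now turns out that $\{z^k\}$ can also be replaced by a lower-dimensional sequence
as follows: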


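Lemma 1. Assume that $z^0$ is of the form

$$z_{ij}^0 = \begin{cases} a_{ij} y_j^0 + \dfrac{r_i(y^0)}{d(i)}, & a_{ij} \ne 0 \\[1ex] 0, & a_{ij} = 0 \end{cases}$$

for some $y^0 \in \mathbb{R}^n$. Define the sequence $\{y^k\}$ via the recursion

$$y^{k+1} = (1-\rho_k)\, y^k + \rho_k x^{k+1}, \qquad k \ge 0 .$$

Then for all $k \ge 0$,

$$z_{ij}^k = \begin{cases} a_{ij} y_j^k + \dfrac{r_i(y^k)}{d(i)}, & a_{ij} \ne 0 \\[1ex] 0, & a_{ij} = 0 . \end{cases} \tag{16}$$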


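Proof. Assuming (16) holds for a given $k$, we have, when $a_{ij} \ne 0$,

$$\begin{aligned}
a_{ij} y_j^{k+1} + \tfrac{1}{d(i)}\, r_i(y^{k+1})
&= a_{ij}\left( (1-\rho_k) y_j^k + \rho_k x_j^{k+1} \right) + \tfrac{1}{d(i)} \left( b_i - \langle A_{i\cdot}, y^{k+1} \rangle \right) \\
&= (1-\rho_k) a_{ij} y_j^k + \rho_k a_{ij} x_j^{k+1} + \tfrac{1}{d(i)} \left( (1-\rho_k)(b_i - \langle A_{i\cdot}, y^k \rangle) + \rho_k (b_i - \langle A_{i\cdot}, x^{k+1} \rangle) \right) \\
&= (1-\rho_k) \left[ a_{ij} y_j^k + \tfrac{1}{d(i)}\, r_i(y^k) \right] + \rho_k a_{ij} x_j^{k+1} + \tfrac{\rho_k}{d(i)}\, r_i(x^{k+1}) \\
&= (1-\rho_k) z_{ij}^k + \rho_k a_{ij} x_j^{k+1} + \tfrac{\rho_k}{d(i)}\, r_i(x^{k+1}) \\
&= z_{ij}^{k+1} .
\end{aligned}$$

So, (16) holds with $k$ replaced by $k+1$, and the claim follows by induction. ■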
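In view of Lemma 1, the method (6)-(8) reduces to

$$\begin{aligned}
x_j^{k+1} &\overset{\varepsilon_k}{\approx} \arg\min_{x_j} \left\{ h_j(x_j) - \langle A_{\cdot j}, \pi^k \rangle x_j + \tfrac{\lambda}{2} \sum_{i:\,a_{ij} \ne 0} \left( a_{ij} x_j - \left[ a_{ij} y_j^k + \frac{r_i(y^k)}{d(i)} \right] \right)^2 \right\} \\
\pi_i^{k+1} &= \pi_i^k + \frac{\lambda \rho_k}{d(i)}\, r_i(x^{k+1}) \\
y^{k+1} &= \rho_k x^{k+1} + (1-\rho_k)\, y^k .
\end{aligned} \tag{17}$$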

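Collecting terms in the squared expressions and assuming $A$ has no all-zero columns, one obtains

$$\begin{aligned}
x_j^{k+1} &\overset{\varepsilon_k}{\approx} \arg\min_{x_j} \left\{ h_j(x_j) - \langle A_{\cdot j}, \pi^k \rangle x_j + \tfrac{\lambda}{2} \|A_{\cdot j}\|^2 \left( x_j - y_j^k - \frac{1}{\|A_{\cdot j}\|^2} \sum_{i:\,a_{ij} \ne 0} \frac{a_{ij}\, r_i(y^k)}{d(i)} \right)^2 \right\} \\
\pi_i^{k+1} &= \pi_i^k + \frac{\lambda \rho_k}{d(i)}\, r_i(x^{k+1}) \\
y^{k+1} &= \rho_k x^{k+1} + (1-\rho_k)\, y^k ,
\end{aligned} \tag{18}$$

where $y^0 \in \mathbb{R}^n$ and $\pi^0 \in \mathbb{R}^m$ are completely arbitrary. One may call this algorithm an alternating
step method because of the alternating updates to the primal variables $x^k$ and $y^k$ and dual variables
$\pi^k$. The updates of the primal variables $x_j$ and $y_j$ are completely independent over $j$, and the
updates of the dual variables $\pi_i$ are likewise independent over $i$. Thus, the method has the potential
for massive parallelism.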
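The following sketch runs the alternating step recursion on a separable quadratic cost, for which the one-dimensional minimization has a closed form. The cost model h_j(x_j) = 0.5*q[j]*x_j^2 + c[j]*x_j with q[j] > 0, the function name, the zero starting point, and the assumption that A has no zero rows or columns are all illustrative choices rather than anything specified in the paper.

```python
def alternating_step_quadratic(A, b, q, c, lam=1.0, rho=1.0, iters=2000):
    """Alternating step iteration for h_j(x_j) = 0.5*q[j]*x_j**2 + c[j]*x_j."""
    m, n = len(A), len(A[0])
    d = [sum(1 for a in A[i] if a != 0) for i in range(m)]            # degrees d(i)
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]  # ||A_.j||^2
    x = [0.0] * n
    y = [0.0] * n
    pi = [0.0] * m

    def residual(v):
        # r_i(v) = b_i - <A_i., v>
        return [b[i] - sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]

    for _ in range(iters):
        r_y = residual(y)
        for j in range(n):  # primal updates, independent over j
            s = sum(A[i][j] * pi[i] for i in range(m))  # <A_.j, pi>
            shift = sum(A[i][j] * r_y[i] / d[i] for i in range(m) if A[i][j] != 0)
            t = y[j] + shift / col_sq[j]
            w = lam * col_sq[j]
            # exact minimizer of 0.5*q*x^2 + c*x - s*x + (w/2)*(x - t)^2
            x[j] = (w * t + s - c[j]) / (q[j] + w)
        r_x = residual(x)
        for i in range(m):  # dual updates, independent over i
            pi[i] += lam * rho * r_x[i] / d[i]
        y = [rho * x[j] + (1.0 - rho) * y[j] for j in range(n)]
    return x, pi
```

The two inner loops touch disjoint data, which is the source of the parallelism noted above: each `x[j]` needs only column j of A, and each `pi[i]` needs only the residual of row i.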


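Now consider the convergence of (18). Recall that the dual of the monotropic program (9) may be
written [16, Chapter 11]

$$\underset{\pi \in \mathbb{R}^m}{\text{minimize}} \quad \sum_{j=1}^{n} h_j^*\!\left( \langle A_{\cdot j}, \pi \rangle \right) - b^{\top}\pi , \tag{19}$$

where $h_j^*$ denotes the convex conjugate of $h_j$.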

Theorem 2. Let $\{\varepsilon_k\} \subseteq [0, \infty)$ be summable and suppose $0 < \inf_{k \ge 0} \rho_k \le \sup_{k \ge 0} \rho_k < 2$. If $A$ has no
identically zero columns and both the monotropic program (9) and its dual (19) have optimal solutions
with finite objective value, then any sequences $\{x^k\}$ and $\{\pi^k\}$ conforming to the recursion (18)
respectively converge to such optimal solutions. If (19) has no optimal solution, then, given any two
sequences $\{y^k\}$ and $\{\pi^k\}$ conforming to (18), one of them must be unbounded.


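Proof. Suppose $p_{ij} = -\pi_i$ whenever $a_{ij} \ne 0$. Then

$$-M^{\top}p = \left[ \sum_{i:\,a_{ij} \ne 0} a_{ij} \pi_i \right]_{j=1}^{n} = A^{\top}\pi . \tag{20}$$

Applying basic convex analysis to $g$, we find that

$$\partial g(z) = \begin{cases} \left\{ p \in \mathbb{R}^{mn} \;\middle|\; a_{ij} \ne 0,\ a_{il} \ne 0 \Rightarrow p_{ij} = p_{il} \right\}, & z \in C \\ \emptyset, & \text{otherwise} \end{cases}$$

$$\partial g^*(p) = \begin{cases} C, & a_{ij} \ne 0,\ a_{il} \ne 0 \Rightarrow p_{ij} = p_{il} \\ \emptyset, & \text{otherwise} \end{cases}$$

$$g^*(p) = \begin{cases} -b^{\top}\pi, & \exists\, \pi \in \mathbb{R}^m:\ a_{ij} \ne 0 \Rightarrow p_{ij} = -\pi_i \\ +\infty, & \text{otherwise} . \end{cases}$$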

In view of (20), it follows that the dual problems (5) and (19) are equivalent in the sense that

$$f^*(-M^{\top}p) + g^*(p) = \sum_{j=1}^{n} h_j^*\!\left( \langle A_{\cdot j}, \pi \rangle \right) - b^{\top}\pi$$

whenever $p_{ij} = -\pi_i$ for all $i$ and $j$, and that if $p_{ij} \ne p_{il}$ for any $i$, $j$, and $l$ such that $a_{ij}$ and $a_{il}$ are nonzero,
then $f^*(-M^{\top}p) + g^*(p) = \infty$. Thus, if $x^*$ and $\pi^*$ are respectively optimal for the monotropic program
and its dual, then $x^*$ and $z^* = Mx^*$ are optimal for (1) and $p^*$ defined by $p^*_{ij} = -\pi^*_i$ is optimal
for (5), so $(x^*, p^*)$ form a Kuhn-Tucker pair in the sense of Theorem 1. Since $A$ has no zero
columns, $M$ has full column rank, and Theorem 1 with $\mu_k = \sqrt{n}\,\varepsilon_k$ and $\nu_k = 0$ for all $k$ gives
convergence of $\{x^k\}$ and $\{p^k\}$ in algorithm (6)-(8). By the derivation above and the equivalence of
the primal and dual problems, this implies the convergence of $\{x^k\}$ and $\{\pi^k\}$ to their respective
optima (note that nothing is said in this case about convergence of $\{y^k\}$). When the monotropic
dual (19) does not have a finite optimum, then the dual equivalence already shown implies (5) has
none either, and one can again invoke Theorem 1. ■

The method (17) is based on an underlying alternating direction iteration (6)-(8) in which some
variables, namely $\{p^k\}$ and $\{z^k\}$, have dimension $mn$. However, it reduces to a form in which all
the variables have dimension either $m$ or $n$. A convergence proof involving only the lower-dimensional
sequences would be very appealing, but so far none has been found.


Consider now the (completely general) linear programming problem

$$\begin{array}{ll} \text{minimize} & c^{\top}x \\ \text{subject to} & Ax = b \\ & l \le x \le u , \end{array}$$

where $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $l \in [-\infty, \infty)^n$, $u \in (-\infty, \infty]^n$, and $A$ is an $m \times n$ real matrix. This can easily
be converted to the form (9) by combining the constraints $l \le x \le u$ with the objective function. It
then turns out that the variables of (19) are the usual linear programming dual variables. The
minimization step of (18) may then be done exactly in closed form, and the method reduces after
some algebra to


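$$\begin{aligned}
x_j^{k+1} &= \max\left\{ l_j,\ \min\left\{ u_j,\ y_j^k + \frac{1}{\|A_{\cdot j}\|^2} \left[ \sum_{i:\,a_{ij} \ne 0} \frac{a_{ij}\, r_i(y^k)}{d(i)} - \frac{1}{\lambda} \left( c_j - \langle A_{\cdot j}, \pi^k \rangle \right) \right] \right\} \right\} \\
\pi_i^{k+1} &= \pi_i^k + \frac{\lambda \rho_k}{d(i)}\, r_i(x^{k+1}) \\
y^{k+1} &= \rho_k x^{k+1} + (1-\rho_k)\, y^k .
\end{aligned} \tag{21}$$

Setting $\rho_k = 1$ for all $k$ yields the even simpler method

$$\begin{aligned}
x_j^{k+1} &= \max\left\{ l_j,\ \min\left\{ u_j,\ x_j^k + \frac{1}{\|A_{\cdot j}\|^2} \left[ \sum_{i:\,a_{ij} \ne 0} \frac{a_{ij}\, r_i(x^k)}{d(i)} - \frac{1}{\lambda} \left( c_j - \langle A_{\cdot j}, \pi^k \rangle \right) \right] \right\} \right\} \\
\pi_i^{k+1} &= \pi_i^k + \frac{\lambda}{d(i)}\, r_i(x^{k+1}) .
\end{aligned} \tag{22}$$

Here, each primal variable $x_j$ is first adjusted by terms proportional to its reduced cost
$c_j - \langle A_{\cdot j}, \pi^k \rangle$ and the amount of violation of constraints in which it is involved. Then, it is
projected onto its valid range $[l_j, u_j] \cap \mathbb{R}$. After this, each dual variable is adjusted proportionally to
any remaining constraint violation, and the process repeats. Primal feasibility, dual feasibility, and
complementary slackness are not maintained; instead, all are gradually satisfied as the algorithm
progresses towards a fixed point of the recursions (21) or (22). The algorithm is highly parallel;
indeed, the only need for communication between processors responsible for the various primal
and dual variables is in the computation and dissemination of the surpluses $r_i(x^k)$.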
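One pass of the simplified iteration with all relaxation factors equal to 1 can be sketched as below. The update formulas are derived here from (18) with a linear cost plus bound constraints, so they should be read as a reconstruction; the function names, the dense list representation, and the assumption that A has no zero rows or columns are illustrative.

```python
def lp_step(A, b, c, lower, upper, x, pi, lam=1.0):
    """One primal sweep and one dual sweep; returns (x_new, pi_new)."""
    m, n = len(A), len(A[0])
    d = [sum(1 for a in A[i] if a != 0) for i in range(m)]  # degrees d(i)
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    x_new = []
    for j in range(n):
        col_sq = sum(A[i][j] ** 2 for i in range(m))
        reduced_cost = c[j] - sum(A[i][j] * pi[i] for i in range(m))
        violation = sum(A[i][j] * r[i] / d[i] for i in range(m) if A[i][j] != 0)
        trial = x[j] + (violation - reduced_cost / lam) / col_sq
        # projection onto the interval [lower[j], upper[j]]
        x_new.append(max(lower[j], min(upper[j], trial)))
    r_new = [b[i] - sum(A[i][j] * x_new[j] for j in range(n)) for i in range(m)]
    pi_new = [pi[i] + lam * r_new[i] / d[i] for i in range(m)]
    return x_new, pi_new


def lp_iterate(A, b, c, lower, upper, lam=1.0, iters=500):
    """Repeat lp_step from a zero starting point."""
    x = [0.0] * len(A[0])
    pi = [0.0] * len(A)
    for _ in range(iters):
        x, pi = lp_step(A, b, c, lower, upper, x, pi, lam)
    return x, pi
```

On the tiny problem of minimizing x1 + 2*x2 subject to x1 + x2 = 1 and x >= 0, the optimal pair x = (1, 0), pi = 1 is a fixed point of `lp_step`. This small instance converges quickly, which should not be taken as representative of practical performance on larger problems.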

Linear programming algorithms of this type are also known to have a linear convergence rate;
see [5] and [3]. However, they were tested in [3] and found to converge very slowly in practice on
the majority of NETGEN-generated [12] minimum cost flow problems. On the other hand, the
minimization in (18) may also be done in closed form when the cost function is quadratic, and for
NETGEN-like quadratic problems, computational tests have been quite encouraging; see [6] for
details of both the specialization of the algorithm and its performance.

4.  Block-Separable  Problems:  an  Epigraphic  Projection  Method 

Let us now turn to a more general class of convex programs, as studied in [18, Section 4]. Let
$\{1,\dots,n\}$ be partitioned into $d \ge 2$ nonempty subsets $N_j$, $j = 1,\dots,d$, and let $n_j = |N_j|$ for all $j$. For any
$x \in \mathbb{R}^n$, let $x_j \in \mathbb{R}^{n_j}$ denote the subvector of $x$ with components $x_q$, $q \in N_j$. A convex program is
called block-separable if for some such partition, it takes the form

$$\begin{array}{ll} \text{minimize} & \displaystyle\sum_{j=1}^{d} h_{0j}(x_j) \\ \text{subject to} & \displaystyle\sum_{j=1}^{d} h_{ij}(x_j) \le 0, \quad i = 1,\dots,m , \end{array} \tag{23}$$

where the $h_{ij}$ are convex functions on $\mathbb{R}^{n_j}$ for $i = 0,\dots,m$ and $j = 1,\dots,d$. To simplify the analysis,
assume that the $h_{ij}$ are finite on $\mathbb{R}^{n_j}$ for $i \ge 1$. For the $h_{0j}$, assume, slightly more generally than in
[18], that

$$h_{0j}(x_j) = \begin{cases} \hat{h}_{0j}(x_j), & x_j \in C_j \\ +\infty, & x_j \notin C_j , \end{cases} \tag{24}$$

where the $\hat{h}_{0j}$ are finite and convex throughout $\mathbb{R}^{n_j}$, and the $C_j \subseteq \mathbb{R}^{n_j}$ are closed convex sets. Let
$C = C_1 \times \dots \times C_d \subseteq \mathbb{R}^n$.


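We start by expressing (23) as a minimum problem for a sum of three convex functions. First,
some notation: let the components of vectors $u \in \mathbb{R}^{md}$ be written $u_{ij}$, for $i = 1,\dots,m$ and $j = 1,\dots,d$.
Similarly, denote the components of vectors $z \in \mathbb{R}^{mn}$ by $z_{iq}$, for $i = 1,\dots,m$ and $q = 1,\dots,n$, and let $H$
be the $mn \times n$ matrix taking vectors $x \in \mathbb{R}^n$ to $z \in \mathbb{R}^{mn}$, where $z_{iq} = x_q$ for all $i$ and $q$. That is,

$$H = \left. \begin{bmatrix} I \\ I \\ \vdots \\ I \end{bmatrix} \right\} m \text{ times} .$$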


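For any $z \in \mathbb{R}^{mn}$, let $z_{ij}$ denote the subvector of $z$ consisting of the $z_{iq}$, $q \in N_j$. Now define
functions

$$F_1 : x \in \mathbb{R}^n \mapsto \sum_{j=1}^{d} h_{0j}(x_j)$$

$$F_2 : u \in \mathbb{R}^{md} \mapsto \begin{cases} 0, & \displaystyle\sum_{j=1}^{d} u_{ij} = 0 \;\; \forall\, i = 1,\dots,m \\ +\infty, & \text{otherwise} \end{cases}$$

$$F_3 : (z,u) \in \mathbb{R}^{mn} \times \mathbb{R}^{md} \mapsto \begin{cases} 0, & h_{ij}(z_{ij}) \le u_{ij} \;\; \forall\, i = 1,\dots,m,\ j = 1,\dots,d \\ +\infty, & \text{otherwise} . \end{cases}$$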


$F_1$, $F_2$, and $F_3$ are closed proper convex. Furthermore, the problem

$$\underset{x \in \mathbb{R}^n,\ u \in \mathbb{R}^{md}}{\text{minimize}} \quad F_1(x) + F_2(u) + F_3(Hx, u) \tag{25}$$

is equivalent to (23) in that $x$ is feasible with finite objective value in (23) if and only if there
exists $u \in \mathbb{R}^{md}$ such that $F_1(x) + F_2(u) + F_3(Hx,u)$ is finite, in which case

$$F_1(x) + F_2(u) + F_3(Hx, u) = \sum_{j=1}^{d} h_{0j}(x_j) .$$

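Now consider how to convert (25) into the canonical form (4). One approach would be to set

$$f(x,u) = F_1(x) + F_3(Hx,u), \qquad g(x,u) = F_2(u)$$

and let $M$ be the $(n+md)$-dimensional identity matrix. Applying algorithm (6)-(8) with all
minimizations performed exactly, for simplicity, then yields (after considerable algebra) the recursions

$$x_j^{k+1} = \arg\min_{x_j \in C_j} \left\{ \hat{h}_{0j}(x_j) + \tfrac{\lambda}{2} \|x_j - y_j^k\|^2 + \tfrac{\lambda}{2} \sum_{i=1}^{m} \max\left\{ 0,\ h_{ij}(x_j) - v_{ij}^k - \tfrac{1}{\lambda}\pi_i^k \right\}^2 \right\} \tag{26}$$

$$u_{ij}^{k+1} = \max\left\{ v_{ij}^k + \tfrac{1}{\lambda}\pi_i^k,\ h_{ij}(x_j^{k+1}) \right\} \tag{27}$$

$$y^{k+1} = \rho_k x^{k+1} + (1-\rho_k)\, y^k \tag{28}$$

$$v_{ij}^{k+1} = \rho_k u_{ij}^{k+1} + (1-\rho_k)\, v_{ij}^k - \frac{\rho_k}{d} \sum_{l=1}^{d} u_{il}^{k+1} \tag{29}$$

$$\pi_i^{k+1} = \pi_i^k - \frac{\lambda \rho_k}{d} \sum_{j=1}^{d} u_{ij}^{k+1} . \tag{30}$$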


Here, $\{x^k\}$ and $\{y^k\}$ are in $\mathbb{R}^n$, $\{u^k\}$ and $\{v^k\}$ are in $\mathbb{R}^{md}$ with $\sum_{j=1}^{d} v_{ij}^k = 0$ for all $i$, and $\{\pi^k\}$ is in
$\mathbb{R}^m$. Further setting $\rho_k = 1$ for all $k$ and eliminating $\{y^k\}$ yields the decomposition method
proposed by Spingarn [18, Section 4]. Spingarn’s derivation is different, being based on the notion of
the partial inverse of a monotone operator [17,18], but the fundamental principle is the same as
that behind the alternating direction method of multipliers, as pointed out in [4].

One problem with the method (26)-(30) is that the minimand in (26) may possess a very complicated
pattern of nondifferentiable points, militating against the use of standard unconstrained
methods to obtain $x_j^{k+1}$. Fundamentally, this difficulty arises because (23) has not been
decomposed “vertically,” that is, with respect to $i$, even though it has been decomposed “horizontally,”
that is, with respect to $j$.

An alternative approach to converting (23) to the form (4) is to combine $F_1$ and $F_2$, as opposed to
$F_1$ and $F_3$. Because $F_1$ depends only on $x$, and $F_2$ depends only on $u$, one thus preserves the
structure of the three-way splitting embodied in (25), and the resulting algorithm achieves a more
thorough decomposition of the problem. We let

$$f(x,u) = F_1(x) + F_2(u) = \begin{cases} \displaystyle\sum_{j=1}^{d} h_{0j}(x_j), & \displaystyle\sum_{j=1}^{d} u_{ij} = 0 \;\; \forall\, i = 1,\dots,m \\ +\infty, & \text{otherwise} \end{cases} \tag{31}$$

$$M = \begin{bmatrix} H & 0 \\ 0 & I \end{bmatrix} \tag{32}$$

$$g(z,u) = F_3(z,u) . \tag{33}$$

In (32), $M$ is of dimension $(mn+md) \times (n+md)$, and has full column rank from the definition of $H$.
Under (31)-(33), problem (4) is equivalent to (25), and hence to (23).

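We now apply the alternating direction method (6)-(8). The role of the "$x^k$" variables will be
played by pairs $(x^k, u^k) \in \mathbb{R}^n \times \mathbb{R}^{md}$, the place of the "$z^k$" variables will be taken by pairs
$(z^k, v^k) \in \mathbb{R}^{mn} \times \mathbb{R}^{md}$, and the role of the multipliers "$p^k$" will be assumed by pairs
$(p^k, q^k) \in \mathbb{R}^{mn} \times \mathbb{R}^{md}$. By the separability of $x$ and $u$ in (31), (6) decomposes immediately into

$$x^{k+1} \overset{\sqrt{d}\,\varepsilon_k}{\approx} \arg\min_{x} \left\{ F_1(x) + \langle p^k, Hx \rangle + \tfrac{\lambda}{2} \|Hx - z^k\|^2 \right\} \tag{34}$$

$$u^{k+1} = \arg\min_{u} \left\{ F_2(u) + \langle q^k, u \rangle + \tfrac{\lambda}{2} \|u - v^k\|^2 \right\} , \tag{35}$$

where $\varepsilon_k = \mu_k/\sqrt{d}$. In view of the structure of $F_1$ and $F_2$, (34) and (35) may be satisfied by
setting

$$x_j^{k+1} \overset{\varepsilon_k}{\approx} \arg\min_{x_j \in \mathbb{R}^{n_j}} \left\{ h_{0j}(x_j) + \left\langle \sum_{i=1}^{m} p_{ij}^k,\ x_j \right\rangle + \tfrac{\lambda}{2} \sum_{i=1}^{m} \|x_j - z_{ij}^k\|^2 \right\} \tag{36}$$

$$u_{ij}^{k+1} = \left( v_{ij}^k - \frac{1}{d} \sum_{l=1}^{d} v_{il}^k \right) - \frac{1}{\lambda} \left( q_{ij}^k - \frac{1}{d} \sum_{l=1}^{d} q_{il}^k \right) . \tag{37}$$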
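The closed-form $u$-update is a row-by-row centering operation, which the following sketch makes explicit. The formula coded here is the minimizer of $F_2(u) + \langle q, u \rangle + \tfrac{\lambda}{2}\|u - v\|^2$ over arrays whose rows sum to zero, derived for this illustration; the function name and the list-of-rows layout (row i holds the d entries for constraint i) are assumptions.

```python
def update_u(v, q, lam):
    """Return u with u_ij = (v_ij - mean_l v_il) - (q_ij - mean_l q_il) / lam."""
    u = []
    for v_row, q_row in zip(v, q):
        d = len(v_row)
        v_bar = sum(v_row) / d
        q_bar = sum(q_row) / d
        u.append([(vij - v_bar) - (qij - q_bar) / lam
                  for vij, qij in zip(v_row, q_row)])
    return u
```

Each row of the result sums to zero, and the gradient q_ij + lam*(u_ij - v_ij) is constant within a row, which is the optimality condition on the zero-sum subspace. The m rows can be processed independently.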


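Next, one must apply (7), which now decomposes into $md$ calculations of the form

$$(z_{ij}^{k+1}, v_{ij}^{k+1}) \overset{\nu_k/\sqrt{md}}{\approx} \arg\min_{h_{ij}(z_{ij}) \le v_{ij}} \left\{ -\langle p_{ij}^k, z_{ij} \rangle - q_{ij}^k v_{ij} + \tfrac{\lambda}{2} \left\| z_{ij} - \left( \rho_k x_j^{k+1} + (1-\rho_k) z_{ij}^k \right) \right\|^2 + \tfrac{\lambda}{2} \left( v_{ij} - \left( \rho_k u_{ij}^{k+1} + (1-\rho_k) v_{ij}^k \right) \right)^2 \right\} .$$

Completing the square and rewriting,

$$(z_{ij}^{k+1}, v_{ij}^{k+1}) \overset{\nu_k/\sqrt{md}}{\approx} \arg\min_{h_{ij}(z_{ij}) \le v_{ij}} \left\{ \left\| z_{ij} - \left( \rho_k x_j^{k+1} + (1-\rho_k) z_{ij}^k + \tfrac{1}{\lambda} p_{ij}^k \right) \right\|^2 + \left( v_{ij} - \left( \rho_k u_{ij}^{k+1} + (1-\rho_k) v_{ij}^k + \tfrac{1}{\lambda} q_{ij}^k \right) \right)^2 \right\} , \tag{38}$$

that is, one must project the point

$$\begin{pmatrix} \rho_k x_j^{k+1} + (1-\rho_k) z_{ij}^k + \tfrac{1}{\lambda} p_{ij}^k \\[0.5ex] \rho_k u_{ij}^{k+1} + (1-\rho_k) v_{ij}^k + \tfrac{1}{\lambda} q_{ij}^k \end{pmatrix}
= \rho_k \begin{pmatrix} x_j^{k+1} \\ u_{ij}^{k+1} \end{pmatrix} + (1-\rho_k) \begin{pmatrix} z_{ij}^k \\ v_{ij}^k \end{pmatrix} + \frac{1}{\lambda} \begin{pmatrix} p_{ij}^k \\ q_{ij}^k \end{pmatrix}$$

onto the epigraph

$$\operatorname{epi} h_{ij} = \left\{ (x_j, v) \in \mathbb{R}^{n_j} \times \mathbb{R} \;\middle|\; h_{ij}(x_j) \le v \right\} .$$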


Lemma 2. Suppose that $h:\mathbb{R}^s \to \mathbb{R}$ is convex and everywhere finite. Then the projection $(y^*, v^*)$
of any $(x, u) \in \mathbb{R}^s \times \mathbb{R}$ onto $\operatorname{epi} h$ can be accomplished by the following procedure:

    if h(x) <= u
        then (y*, v*) := (x, u);
    else begin
        y* := argmin_y { ||y - x||^2 + (h(y) - u)^2 };
        v* := h(y*);
    end;

Proof. If $h(x) \le u$, then $(x, u) \in \operatorname{epi} h$, so $(y^*, v^*) = (x, u)$. Otherwise, $(x, u) \notin \operatorname{epi} h$. Being convex
and finite everywhere, $h$ is a continuous function, therefore $\eta(y, v) = h(y) - v$ is continuous.
Now, $\operatorname{epi} h = \{(y,v) \mid \eta(y,v) \le 0\}$, so $\operatorname{int}(\operatorname{epi} h) = \{(y,v) \mid \eta(y,v) < 0\}$ and the boundary of $\operatorname{epi} h$,
$\operatorname{bd}(\operatorname{epi} h)$, is $\{(y,v) \mid \eta(y,v) = 0\}$. The projection of $(x, u)$ onto $\operatorname{epi} h$ must lie in $\operatorname{bd}(\operatorname{epi} h)$, and
therefore must be of the form $(y, h(y))$. The suggested procedure minimizes the distance between
$(x, u)$ and points of this form. ■
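For a function of a single real variable, the projection can be sketched with a one-dimensional search. This is only an illustration: it handles the scalar case, it minimizes the convex merit function (y - x)^2 + max(0, h(y) - u)^2 rather than the lemma's expression, and the ternary search, bracket, iteration count, and function name are all choices made here.

```python
def project_onto_epigraph(h, x, u, iters=200):
    """Project (x, u) onto epi h = {(y, v): h(y) <= v}, for finite convex h on the real line."""
    gap = h(x) - u
    if gap <= 0.0:
        return x, u  # the point already lies in the epigraph

    def phi(y):
        # squared distance from (x, u) to the closest point (y, v) with v >= h(y)
        return (y - x) ** 2 + max(0.0, h(y) - u) ** 2

    # phi(x) = gap**2, so any minimizer satisfies |y - x| <= gap
    lo, hi = x - gap, x + gap
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) <= phi(m2):
            hi = m2
        else:
            lo = m1
    y = 0.5 * (lo + hi)
    return y, max(u, h(y))
```

For example, with h = abs the point (3, -1) projects to approximately (1, 1). The search is accurate only to roughly the square root of machine precision, which is the usual limit of a comparison-based search near a smooth minimum.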

In general, let the notation $\operatorname{epp}(h, (x,u))$ stand for the projection of $(x, u)$ onto $\operatorname{epi} h$. Note that for
general convex functions taking the value $+\infty$, the procedure of Lemma 2 should be replaced by

$$y^* := \arg\min_{y} \left\{ \|y - x\|^2 + \max\{0,\ h(y) - u\}^2 \right\}, \qquad v^* := \max\{u,\ h(y^*)\} .$$

Here, such a situation is precluded by the assumptions on (23). In view of (36), (37), and (38), (6)-(8)
now reduce to the following recursions:


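$$x_j^{k+1} \overset{\varepsilon_k}{\approx} \arg\min_{x_j \in \mathbb{R}^{n_j}} \left\{ h_{0j}(x_j) + \left\langle \sum_{i=1}^{m} p_{ij}^k,\ x_j \right\rangle + \tfrac{\lambda}{2} \sum_{i=1}^{m} \|x_j - z_{ij}^k\|^2 \right\} \tag{39}$$

$$u_{ij}^{k+1} = \left( v_{ij}^k - \frac{1}{d} \sum_{l=1}^{d} v_{il}^k \right) - \frac{1}{\lambda} \left( q_{ij}^k - \frac{1}{d} \sum_{l=1}^{d} q_{il}^k \right) \tag{40}$$

$$(z_{ij}^{k+1}, v_{ij}^{k+1}) \overset{\delta_k}{\approx} \operatorname{epp}\left( h_{ij},\ \left( \rho_k x_j^{k+1} + (1-\rho_k) z_{ij}^k + \tfrac{1}{\lambda} p_{ij}^k,\ \ \rho_k u_{ij}^{k+1} + (1-\rho_k) v_{ij}^k + \tfrac{1}{\lambda} q_{ij}^k \right) \right) \tag{41}$$

$$p_{ij}^{k+1} = p_{ij}^k + \lambda \left( \rho_k x_j^{k+1} + (1-\rho_k) z_{ij}^k - z_{ij}^{k+1} \right) \tag{42}$$

$$q_{ij}^{k+1} = q_{ij}^k + \lambda \left( \rho_k u_{ij}^{k+1} + (1-\rho_k) v_{ij}^k - v_{ij}^{k+1} \right) , \tag{43}$$

where $\{\varepsilon_k\}$ and $\{\delta_k\}$ are nonnegative, summable sequences. This method will be called an
epigraphic projection method because of step (41). The problem has been more thoroughly decomposed
than in the method (26)-(30) in that no subproblem involves more than one function $h_{ij}$.
Step (41), in particular, decomposes into $md$, as opposed to $d$, separate tasks, affording a large
potential for parallelism. Furthermore, if all the $h_{ij}$ are smooth, then all subproblem minimands,
including those needed to implement (41), are smooth. By comparison, the minimand in (26) will
in general be nonsmooth, even if all the $h_{ij}$ are smooth.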

In some cases, it may be possible to carry out the epigraphic projections (41) exactly; in other
cases, they might have to be performed approximately to some accuracy $\delta_k > 0$. In the subroutine
outlined in Lemma 2, this involves finding some $y \overset{\alpha_k}{\approx} y^*$ such that $h(y) \overset{\beta_k}{\approx} h(y^*)$, where
$\sqrt{\alpha_k^2 + \beta_k^2} \le \delta_k$. Guaranteeing such conditions is potentially complicated in the general case, so
we will not discuss this issue any further here; however, working out similar approximation
conditions for (26)-(27) seems much more complicated, due to coupling between the $h_{ij}$. See [15] for
some exemplary approximate solution criteria.

Consider now the convergence of (39)-(43). The proof requires a Slater condition similar to that
of [18], but modified to accommodate the more general $h_{0j}$ of (24).

Theorem 3. Consider the problem (23), where the $h_{ij}$ are everywhere finite-valued convex for
$i = 1,\dots,m$, and the $h_{0j}$ are as in (24). Suppose $\{\varepsilon_k\}$ and $\{\delta_k\}$ are nonnegative summable sequences,
$0 < \inf_{k \ge 0} \rho_k \le \sup_{k \ge 0} \rho_k < 2$, and the following Slater condition is met:

$$\exists\, \bar{x} \in \operatorname{ri}(\operatorname{dom} h_{01}) \times \dots \times \operatorname{ri}(\operatorname{dom} h_{0d}) : \quad \forall\, i = 1,\dots,m, \quad \sum_{j=1}^{d} h_{ij}(\bar{x}_j) < 0 . \tag{44}$$

Then, if (23) has an optimal solution with finite objective value, the sequence $\{x^k\}$ produced by
the epigraphic projection method (39)-(43), with arbitrary initial conditions, will converge to a
solution of (23).

Proof. First consider, under (31)-(33), the problem

$$\underset{(x,u) \in \mathbb{R}^n \times \mathbb{R}^{md}}{\text{minimize}} \quad f(x,u) + g(M(x,u)) , \tag{45}$$

which is of form (4). If this problem has a Kuhn-Tucker point, then, in view of the above derivation,
setting $\mu_k = \sqrt{d}\,\varepsilon_k$ and $\nu_k = \sqrt{md}\,\delta_k$ in Theorem 1 gives the desired result. So, it suffices to
show (45) has a Kuhn-Tucker point.

Suppose $x^*$ is optimal for (23). Then there must exist a $u^* \in \mathbb{R}^{md}$ such that $(x^*, u^*)$ is optimal for
(25) and hence for the equivalent problem (45), that is

$$(0,0) \in \partial[f + g \circ M](x^*, u^*) .$$

Suppose it were true that

$$\partial[f + g \circ M](x,u) = \partial f(x,u) + M^{\top} \partial g(M(x,u)) \qquad \forall\, (x,u) \in \mathbb{R}^n \times \mathbb{R}^{md} . \tag{46}$$

Then there would have to exist $(p^*, q^*) \in \partial g(M(x^*, u^*))$ such that $-M^{\top}(p^*, q^*) \in \partial f(x^*, u^*)$,
and $((x^*, u^*), (p^*, q^*))$ would be a Kuhn-Tucker pair for (45). So, it suffices to prove (46).

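Define the linear space

$$U = \left\{ u \in \mathbb{R}^{md} \;\middle|\; \sum_{j=1}^{d} u_{ij} = 0 \;\; \forall\, i = 1,\dots,m \right\} ,$$

so that

$$\operatorname{ri}(\operatorname{dom} f) = \operatorname{ri}(\operatorname{dom} F_1) \times \operatorname{ri}(\operatorname{dom} F_2) = \left( \operatorname{ri}(\operatorname{dom} h_{01}) \times \dots \times \operatorname{ri}(\operatorname{dom} h_{0d}) \right) \times U .$$

Let $\bar{x}$ be the point whose existence is asserted by (44). For $i = 1,\dots,m$, let

$$\zeta_i = -\sum_{j=1}^{d} h_{ij}(\bar{x}_j) > 0 ,$$

and define $\bar{u} \in \mathbb{R}^{md}$ via $\bar{u}_{ij} = h_{ij}(\bar{x}_j) + \zeta_i/d$. Then, $\bar{u} \in U$, and $(\bar{x}, \bar{u}) \in \operatorname{ri}(\operatorname{dom} f)$. Now,

$$\operatorname{dom}(g \circ M) = \left\{ (x,u) \;\middle|\; h_{ij}(x_j) - u_{ij} \le 0,\ i = 1,\dots,m,\ j = 1,\dots,d \right\} .$$

The $h_{ij}$, $i = 1,\dots,m$, are finite and convex, hence continuous, so the functions

$$\eta_{ij}(x_j, u_{ij}) = h_{ij}(x_j) - u_{ij}$$

are also continuous. For all $i$ and $j$,

$$\eta_{ij}(\bar{x}_j, \bar{u}_{ij}) = h_{ij}(\bar{x}_j) - \left( h_{ij}(\bar{x}_j) + \frac{\zeta_i}{d} \right) = -\frac{\zeta_i}{d} < 0 .$$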

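It follows that there is an open neighborhood of $(\bar{x}, \bar{u})$ that is contained in $\operatorname{dom}(g \circ M)$, and hence
that

$$(\bar{x}, \bar{u}) \in \operatorname{int}(\operatorname{dom}(g \circ M)) = \operatorname{ri}(\operatorname{dom}(g \circ M)) .$$

Therefore, $(\bar{x}, \bar{u}) \in \operatorname{ri}(\operatorname{dom} f) \cap \operatorname{ri}(\operatorname{dom}(g \circ M))$, and

$$\partial[f + g \circ M](x,u) = \partial f(x,u) + \partial[g \circ M](x,u) \qquad \forall\, (x,u) \in \mathbb{R}^n \times \mathbb{R}^{md}$$

by [14, Theorem 23.8]. Similarly,

$$M(\bar{x}, \bar{u}) \in \operatorname{int}(\operatorname{dom} g) = \operatorname{ri}(\operatorname{dom} g) ,$$

so

$$\partial[g \circ M](x,u) = M^{\top} \partial g(M(x,u)) \qquad \forall\, (x,u) \in \mathbb{R}^n \times \mathbb{R}^{md}$$

by [14, Theorem 23.9]; (46) is thus established. ■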

Following  Theorems  23.8  and  23.9  of  [14],  alternatives  to  the  condition  (44)  are  possible  when 
some  or  all  of  the  hij  are  polyhedral. 


5.  Conclusion 


This paper has attempted to demonstrate by example the power and generality of alternating
direction multiplier algorithms for decomposing convex programs, and thus parallelizing their
solution. The approach given here subsumes Spingarn’s partial inverse technique [17,18]; for an
explanation, see [4] or [3]. The key step in the alternating direction multiplier approach to
decomposition is careful conversion of the problem of interest into the form (1) or (4). The way this
conversion is done can strongly influence the degree of decomposition attained, as evidenced by the
comparison of method (26)-(30) to the epigraphic projection procedure (39)-(43).

So far, it is not easy to tell when methods derived in this manner will perform well in practice. For
example, the methods of Section 3 are impractical for linear-cost problems, but competitive for
quadratic ones. Perhaps future research will resolve such questions.